Language and Speech Processing Final Project: PCFG Parser

Author

  • Roberto Valenti
Abstract

The use of statistical methods in areas such as language processing, speech recognition and grammar learning has gone from being virtually unknown to being a fundamental approach over the last ten years [1]. As a result, a large number of corpora are now available from which we can extract statistical information that helps us understand the underlying structure of languages. These data sets are usually annotated, and we can use the annotations, together with statistics, to spot regularities or frequent characteristics of a language. Among the many tasks that can be performed with such knowledge, the target of this project is the machine reconstruction of the syntax (parse) of a sentence. We would like to determine which rules generated a sentence, in order to understand which class of sentence we are dealing with. To better explain the task, we can use the noisy channel model [8]. In this model (represented in figure 1), we assume that A has a concept and wants B to receive it. In our case the concept is the correct parse of a sentence. B does not receive the full parse, only the sentence, so B must reconstruct the parse that A had in mind from the observations that arrive through the noisy channel (in figure 1, this is represented by the dashes without the parse).
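
The reconstruction step can be illustrated with a minimal sketch, which is not the project's implementation. Under a PCFG, the receiver B keeps only the candidate parses whose leaves spell out the observed sentence and picks the one with the highest probability, where the probability of a parse is the product of the probabilities of the rules it uses. The grammar, its probabilities, the example sentence and the names RULES, tree_yield, tree_prob and best_parse are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy PCFG: (left-hand side, right-hand side) -> probability.
# The probabilities of all rules sharing a left-hand side sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("she",)): 0.3,
    ("NP", ("stars",)): 0.3,
    ("NP", ("telescopes",)): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("V", ("saw",)): 1.0,
    ("P", ("with",)): 1.0,
}

# A tree is a tuple (label, child, ...); a child is a tree or a word (str).


def tree_yield(tree):
    """Return the list of words at the leaves of a tree."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(tree_yield(child))
    return words


def tree_prob(tree):
    """P(T): product of the probabilities of every rule used in the tree."""
    if isinstance(tree, str):
        return 1.0
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULES.get((label, rhs), 0.0)  # unknown rules get probability 0
    for child in children:
        p *= tree_prob(child)
    return p


def best_parse(sentence, candidates):
    """Pick the most probable candidate whose yield equals the sentence."""
    consistent = [t for t in candidates if tree_yield(t) == sentence]
    return max(consistent, key=tree_prob, default=None)


# Two candidate parses of the same sentence (PP attachment ambiguity).
PP_TREE = ("PP", ("P", "with"), ("NP", "telescopes"))
VP_ATTACH = ("S", ("NP", "she"),
             ("VP", ("VP", ("V", "saw"), ("NP", "stars")), PP_TREE))
NP_ATTACH = ("S", ("NP", "she"),
             ("VP", ("V", "saw"), ("NP", ("NP", "stars"), PP_TREE)))

SENTENCE = "she saw stars with telescopes".split()

if __name__ == "__main__":
    for name, tree in [("VP attachment", VP_ATTACH), ("NP attachment", NP_ATTACH)]:
        print(name, tree_prob(tree))
    print(best_parse(SENTENCE, [VP_ATTACH, NP_ATTACH]))
```

Enumerating candidates by hand only works for a toy case like this one. A practical parser would search the space of parses with a dynamic-programming algorithm such as CKY instead of listing them explicitly.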

Similar resources

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. However, although originating in computational linguistics, the value of treebanks is becoming more widely appreciated in linguistics research as a whole. F...

A Core-Tools Statistical NLP Course

In the fall term of 2004, I taught a new statistical NLP course focusing on core tools and machine-learning algorithms. The course work was organized around four substantial programming assignments in which the students implemented the important parts of several core tools, including language models (for speech reranking), a maximum entropy classifier, a part-of-speech tagger, a PCFG parser, an...

PCFG Parsing for Restricted Classical Chinese Texts

The Probabilistic Context-Free Grammar (PCFG) model is widely used for parsing natural languages, including Modern Chinese. But for Classical Chinese, the computer processing is just commencing. Our previous study on the part-of-speech (POS) tagging of Classical Chinese is a pioneering work in this area. Now in this paper, we move on to the PCFG parsing of Classical Chinese texts. We continue t...

Parsing German Topological Fields with Probabilistic Context-Free Grammars

Jackie Chi Kit Cheung, M.Sc., Graduate Department of Computer Science, University of Toronto, 2009. Syntactic analysis is useful for many natural language processing applications requiring further semantic analysis. Recent research in statistical parsing has produced a number of high-performance parsers using probabilistic co...

A Probabilistic Context-free Grammar for Disambiguation in Morphological Parsing

One of the major problems one is faced with when decomposing words into their constituent parts is ambiguity: the generation of multiple analyses for one input word, many of which are implausible. In order to deal with ambiguity, the MORphological PArser MORPA is provided with a probabilistic context-free grammar (PCFG), i.e. it combines a "conventional" context-free morphological grammar to fi...

Publication date: 2006